The purpose of this project is to use techniques learned in this class to perform exploratory data analysis on a given data set. Two datasets were chosen for this project:
The properties dataset contains information about individual homes, and the transactions dataset records, for homes sold in 2016, the transaction date and the log error of the sale price estimated by Zillow (the Zestimate).
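As background for the `logerror` column used throughout, the Zillow competition defines the log error as log(Zestimate) minus log(SalePrice). The sketch below illustrates that definition only; the prices are hypothetical and are not taken from the dataset.

```r
# Hypothetical prices, for illustration only (not from the dataset).
zestimate  <- c(510000, 298000, 450000)
sale_price <- c(500000, 350000, 450000)

# Log error as the difference of natural logs.
logerror <- log(zestimate) - log(sale_price)

# Positive: the Zestimate was above the sale price.
# Negative: the Zestimate was below the sale price.
round(logerror, 4)
```

A log error close to zero therefore means the estimate was close to the sale price in relative terms.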
## parcelid airconditioningtypeid architecturalstyletypeid
## Min. : 10711725 Min. : 1.0 Min. : 2.0
## 1st Qu.: 11643707 1st Qu.: 1.0 1st Qu.: 7.0
## Median : 12545094 Median : 1.0 Median : 7.0
## Mean : 13325858 Mean : 1.9 Mean : 7.2
## 3rd Qu.: 14097122 3rd Qu.: 1.0 3rd Qu.: 7.0
## Max. :169601949 Max. :13.0 Max. :27.0
## NA's :2173698 NA's :2979156
## basementsqft bathroomcnt bedroomcnt buildingclasstypeid
## Min. : 20.0 Min. : 0.000 Min. : 0.000 Min. :1.0
## 1st Qu.: 272.0 1st Qu.: 2.000 1st Qu.: 2.000 1st Qu.:3.0
## Median : 534.0 Median : 2.000 Median : 3.000 Median :4.0
## Mean : 646.9 Mean : 2.209 Mean : 3.089 Mean :3.7
## 3rd Qu.: 847.2 3rd Qu.: 3.000 3rd Qu.: 4.000 3rd Qu.:4.0
## Max. :8516.0 Max. :20.000 Max. :20.000 Max. :5.0
## NA's :2983589 NA's :11462 NA's :11450 NA's :2972588
## buildingqualitytypeid calculatedbathnbr decktypeid
## Min. : 1.0 Min. : 1.0 Min. :66
## 1st Qu.: 4.0 1st Qu.: 2.0 1st Qu.:66
## Median : 7.0 Median : 2.0 Median :66
## Mean : 5.8 Mean : 2.3 Mean :66
## 3rd Qu.: 7.0 3rd Qu.: 3.0 3rd Qu.:66
## Max. :12.0 Max. :20.0 Max. :66
## NA's :1046729 NA's :128912 NA's :2968121
## finishedfloor1squarefeet calculatedfinishedsquarefeet finishedsquarefeet12
## Min. : 3 Min. : 1 Min. : 1
## 1st Qu.: 1012 1st Qu.: 1213 1st Qu.: 1196
## Median : 1283 Median : 1572 Median : 1539
## Mean : 1381 Mean : 1827 Mean : 1760
## 3rd Qu.: 1615 3rd Qu.: 2136 3rd Qu.: 2070
## Max. :31303 Max. :952576 Max. :290345
## NA's :2782500 NA's :55565 NA's :276033
## finishedsquarefeet13 finishedsquarefeet15 finishedsquarefeet50
## Min. : 120 Min. : 112 Min. : 3
## 1st Qu.: 960 1st Qu.: 1694 1st Qu.: 1013
## Median :1296 Median : 2172 Median : 1284
## Mean :1179 Mean : 2739 Mean : 1389
## 3rd Qu.:1440 3rd Qu.: 2976 3rd Qu.: 1618
## Max. :2688 Max. :820242 Max. :31303
## NA's :2977545 NA's :2794419 NA's :2782500
## finishedsquarefeet6 fips fireplacecnt fullbathcnt
## Min. : 117 Min. :6037 Min. :1.0 Min. : 1.00
## 1st Qu.: 1079 1st Qu.:6037 1st Qu.:1.0 1st Qu.: 2.00
## Median : 1992 Median :6037 Median :1.0 Median : 2.00
## Mean : 2414 Mean :6048 Mean :1.2 Mean : 2.24
## 3rd Qu.: 3366 3rd Qu.:6059 3rd Qu.:1.0 3rd Qu.: 3.00
## Max. :952576 Max. :6111 Max. :9.0 Max. :20.00
## NA's :2963216 NA's :11437 NA's :2672580 NA's :128912
## garagecarcnt garagetotalsqft hashottuborspa heatingorsystemtypeid
## Min. : 0.0 Min. : 0.0 :2916203 Min. : 1
## 1st Qu.: 2.0 1st Qu.: 324.0 true: 69014 1st Qu.: 2
## Median : 2.0 Median : 441.0 Median : 2
## Mean : 1.8 Mean : 383.8 Mean : 4
## 3rd Qu.: 2.0 3rd Qu.: 494.0 3rd Qu.: 7
## Max. :25.0 Max. :7749.0 Max. :24
## NA's :2101950 NA's :2101950 NA's :1178816
## latitude longitude lotsizesquarefeet poolcnt
## Min. :33324388 Min. :-119475780 Min. : 100 Min. :1
## 1st Qu.:33827685 1st Qu.:-118392983 1st Qu.: 5688 1st Qu.:1
## Median :34008249 Median :-118172540 Median : 7000 Median :1
## Mean :34001469 Mean :-118201934 Mean : 22823 Mean :1
## 3rd Qu.:34161860 3rd Qu.:-117949468 3rd Qu.: 9898 3rd Qu.:1
## Max. :34819650 Max. :-117554316 Max. :328263808 Max. :1
## NA's :11437 NA's :11437 NA's :276099 NA's :2467683
## poolsizesum pooltypeid10 pooltypeid2 pooltypeid7
## Min. : 19.0 Min. :1 Min. :1 Min. :1
## 1st Qu.: 430.0 1st Qu.:1 1st Qu.:1 1st Qu.:1
## Median : 495.0 Median :1 Median :1 Median :1
## Mean : 519.7 Mean :1 Mean :1 Mean :1
## 3rd Qu.: 594.0 3rd Qu.:1 3rd Qu.:1 3rd Qu.:1
## Max. :17410.0 Max. :1 Max. :1 Max. :1
## NA's :2957257 NA's :2948278 NA's :2953142 NA's :2499758
## propertycountylandusecode propertylandusetypeid propertyzoningdesc
## 0100 :1153896 Min. : 31 :1006588
## 122 : 522145 1st Qu.:261 LAR1 : 275029
## 0101 : 247494 Median :261 LAR3 : 67105
## 010C : 225410 Mean :260 LARS : 54859
## 1111 : 126491 3rd Qu.:261 LBR1N : 52750
## 34 : 123249 Max. :275 LAR2 : 48808
## (Other): 586532 NA's :11437 (Other):1480078
## rawcensustractandblock regionidcity regionidcounty regionidneighborhood
## Min. :60371011 Min. : 3491 Min. :1286 Min. : 6952
## 1st Qu.:60373203 1st Qu.: 12447 1st Qu.:2061 1st Qu.: 46736
## Median :60375712 Median : 25218 Median :3101 Median :118920
## Mean :60483450 Mean : 34993 Mean :2570 Mean :193476
## 3rd Qu.:60590423 3rd Qu.: 45457 3rd Qu.:3101 3rd Qu.:274800
## Max. :61110091 Max. :396556 Max. :3101 Max. :764167
## NA's :11437 NA's :62845 NA's :11437 NA's :1828815
## regionidzip roomcnt storytypeid threequarterbathnbr
## Min. : 95982 Min. : 0.000 Min. :7 Min. :1
## 1st Qu.: 96180 1st Qu.: 0.000 1st Qu.:7 1st Qu.:1
## Median : 96377 Median : 0.000 Median :7 Median :1
## Mean : 96553 Mean : 1.475 Mean :7 Mean :1
## 3rd Qu.: 96974 3rd Qu.: 0.000 3rd Qu.:7 3rd Qu.:1
## Max. :399675 Max. :96.000 Max. :7 Max. :7
## NA's :13980 NA's :11475 NA's :2983593 NA's :2673586
## typeconstructiontypeid unitcnt yardbuildingsqft17 yardbuildingsqft26
## Min. : 4 Min. : 1.0 Min. : 10.0 Min. : 10.0
## 1st Qu.: 6 1st Qu.: 1.0 1st Qu.: 190.0 1st Qu.: 96.0
## Median : 6 Median : 1.0 Median : 270.0 Median : 168.0
## Mean : 6 Mean : 1.2 Mean : 319.8 Mean : 278.3
## 3rd Qu.: 6 3rd Qu.: 1.0 3rd Qu.: 390.0 3rd Qu.: 320.0
## Max. :13 Max. :997.0 Max. :7983.0 Max. :6141.0
## NA's :2978470 NA's :1007727 NA's :2904862 NA's :2982570
## yearbuilt numberofstories fireplaceflag structuretaxvaluedollarcnt
## Min. :1801 Min. : 1.0 :2980054 Min. : 1
## 1st Qu.:1950 1st Qu.: 1.0 true: 5163 1st Qu.: 74800
## Median :1963 Median : 1.0 Median : 122590
## Mean :1964 Mean : 1.4 Mean : 170884
## 3rd Qu.:1981 3rd Qu.: 2.0 3rd Qu.: 196889
## Max. :2015 Max. :41.0 Max. :251486000
## NA's :59928 NA's :2303148 NA's :54982
## taxvaluedollarcnt assessmentyear landtaxvaluedollarcnt taxamount
## Min. : 1 Min. :2000 Min. : 1 Min. : 1
## 1st Qu.: 179675 1st Qu.:2015 1st Qu.: 74836 1st Qu.: 2461
## Median : 306086 Median :2015 Median : 167042 Median : 3992
## Mean : 420479 Mean :2015 Mean : 252478 Mean : 5378
## 3rd Qu.: 488000 3rd Qu.:2015 3rd Qu.: 306918 3rd Qu.: 6201
## Max. :282786000 Max. :2016 Max. :90246219 Max. :3458861
## NA's :42550 NA's :11439 NA's :67733 NA's :31250
## taxdelinquencyflag taxdelinquencyyear censustractandblock
## :2928755 Min. : 0.0 Min. :-1.000e+00
## Y: 56462 1st Qu.:14.0 1st Qu.: 6.037e+13
## Median :14.0 Median : 6.038e+13
## Mean :13.9 Mean : 6.048e+13
## 3rd Qu.:15.0 3rd Qu.: 6.059e+13
## Max. :99.0 Max. : 4.830e+14
## NA's :2928753 NA's :75126
## parcelid logerror transactiondate
## Min. : 10711738 Min. :-4.60500 2016-07-29: 910
## 1st Qu.: 11559500 1st Qu.:-0.02530 2016-04-29: 902
## Median : 12547337 Median : 0.00600 2016-09-30: 894
## Mean : 12984656 Mean : 0.01146 2016-06-30: 874
## 3rd Qu.: 14227552 3rd Qu.: 0.03920 2016-05-27: 863
## Max. :162960842 Max. : 4.73700 2016-08-31: 737
## (Other) :85095
## parcelid logerror transactiondate
## 1 11016594 0.0276 2016-01-01
## 2 14366692 -0.1684 2016-01-01
## 3 12098116 -0.0040 2016-01-01
## 4 12643413 0.0218 2016-01-02
## 5 14432541 -0.0050 2016-01-02
## 6 11509835 -0.2705 2016-01-02
## parcelid airconditioningtypeid architecturalstyletypeid basementsqft
## 1 10754147 NA NA NA
## 2 10759547 NA NA NA
## 3 10843547 NA NA NA
## 4 10859147 NA NA NA
## 5 10879947 NA NA NA
## 6 10898347 NA NA NA
## bathroomcnt bedroomcnt buildingclasstypeid buildingqualitytypeid
## 1 0 0 NA NA
## 2 0 0 NA NA
## 3 0 0 NA NA
## 4 0 0 3 7
## 5 0 0 4 NA
## 6 0 0 4 7
## calculatedbathnbr decktypeid finishedfloor1squarefeet
## 1 NA NA NA
## 2 NA NA NA
## 3 NA NA NA
## 4 NA NA NA
## 5 NA NA NA
## 6 NA NA NA
## calculatedfinishedsquarefeet finishedsquarefeet12 finishedsquarefeet13
## 1 NA NA NA
## 2 NA NA NA
## 3 73026 NA NA
## 4 5068 NA NA
## 5 1776 NA NA
## 6 2400 NA NA
## finishedsquarefeet15 finishedsquarefeet50 finishedsquarefeet6 fips
## 1 NA NA NA 6037
## 2 NA NA NA 6037
## 3 73026 NA NA 6037
## 4 5068 NA NA 6037
## 5 1776 NA NA 6037
## 6 2400 NA NA 6037
## fireplacecnt fullbathcnt garagecarcnt garagetotalsqft hashottuborspa
## 1 NA NA NA NA
## 2 NA NA NA NA
## 3 NA NA NA NA
## 4 NA NA NA NA
## 5 NA NA NA NA
## 6 NA NA NA NA
## heatingorsystemtypeid latitude longitude lotsizesquarefeet poolcnt
## 1 NA 34144442 -118654084 85768 NA
## 2 NA 34140430 -118625364 4083 NA
## 3 NA 33989359 -118394633 63085 NA
## 4 NA 34148863 -118437206 7521 NA
## 5 NA 34194168 -118385816 8512 NA
## 6 NA 34171873 -118380906 2500 NA
## poolsizesum pooltypeid10 pooltypeid2 pooltypeid7 propertycountylandusecode
## 1 NA NA NA NA 010D
## 2 NA NA NA NA 0109
## 3 NA NA NA NA 1200
## 4 NA NA NA NA 1200
## 5 NA NA NA NA 1210
## 6 NA NA NA NA 1210
## propertylandusetypeid propertyzoningdesc rawcensustractandblock regionidcity
## 1 269 60378002 37688
## 2 261 LCA11* 60378001 37688
## 3 47 LAC2 60377030 51617
## 4 47 LAC2 60371412 12447
## 5 31 LAM1 60371232 12447
## 6 31 LAC4 60371252 12447
## regionidcounty regionidneighborhood regionidzip roomcnt storytypeid
## 1 3101 NA 96337 0 NA
## 2 3101 NA 96337 0 NA
## 3 3101 NA 96095 0 NA
## 4 3101 27080 96424 0 NA
## 5 3101 46795 96450 0 NA
## 6 3101 46795 96446 0 NA
## threequarterbathnbr typeconstructiontypeid unitcnt yardbuildingsqft17
## 1 NA NA NA NA
## 2 NA NA NA NA
## 3 NA NA 2 NA
## 4 NA NA NA NA
## 5 NA NA 1 NA
## 6 NA NA NA NA
## yardbuildingsqft26 yearbuilt numberofstories fireplaceflag
## 1 NA NA NA
## 2 NA NA NA
## 3 NA NA NA
## 4 NA 1948 1
## 5 NA 1947 NA
## 6 NA 1943 1
## structuretaxvaluedollarcnt taxvaluedollarcnt assessmentyear
## 1 NA 9 2015
## 2 NA 27516 2015
## 3 650756 1413387 2015
## 4 571346 1156834 2015
## 5 193796 433491 2015
## 6 176383 283315 2015
## landtaxvaluedollarcnt taxamount taxdelinquencyflag taxdelinquencyyear
## 1 9 NA NA
## 2 27516 NA NA
## 3 762631 20800.37 NA
## 4 585488 14557.57 NA
## 5 239695 5725.17 NA
## 6 106932 3661.28 NA
## censustractandblock
## 1 NA
## 2 NA
## 3 NA
## 4 NA
## 5 NA
## 6 NA
## parcelid airconditioningtypeid architecturalstyletypeid basementsqft
## 1 0 0.7281541 0.9979697 0.9994546
## bathroomcnt bedroomcnt buildingclasstypeid buildingqualitytypeid
## 1 0.003839587 0.003835567 0.9957695 0.3506375
## calculatedbathnbr decktypeid finishedfloor1squarefeet
## 1 0.04318346 0.9942731 0.932093
## calculatedfinishedsquarefeet finishedsquarefeet12 finishedsquarefeet13
## 1 0.01861339 0.09246664 0.99743
## finishedsquarefeet15 finishedsquarefeet50 finishedsquarefeet6 fips
## 1 0.9360857 0.932093 0.99263 0.003831212
## fireplacecnt fullbathcnt garagecarcnt garagetotalsqft hashottuborspa
## 1 0.8952716 0.04318346 0.7041197 0.7041197 0
## heatingorsystemtypeid latitude longitude lotsizesquarefeet poolcnt
## 1 0.3948845 0.003831212 0.003831212 0.09248875 0.8266344
## poolsizesum pooltypeid10 pooltypeid2 pooltypeid7 propertycountylandusecode
## 1 0.9906338 0.987626 0.9892554 0.837379 0
## propertylandusetypeid propertyzoningdesc rawcensustractandblock regionidcity
## 1 0.003831212 0 0.003831212 0.02105207
## regionidcounty regionidneighborhood regionidzip roomcnt storytypeid
## 1 0.003831212 0.6126238 0.004683077 0.003843942 0.999456
## threequarterbathnbr typeconstructiontypeid unitcnt yardbuildingsqft17
## 1 0.8956086 0.9977399 0.3375724 0.9730824
## yardbuildingsqft26 yearbuilt numberofstories fireplaceflag
## 1 0.9991133 0.02007492 0.7715178 0
## structuretaxvaluedollarcnt taxvaluedollarcnt assessmentyear
## 1 0.01841809 0.01425357 0.003831882
## landtaxvaluedollarcnt taxamount taxdelinquencyflag taxdelinquencyyear
## 1 0.02268947 0.01046825 0 0.9810855
## censustractandblock
## 1 0.02516601
## feature missing_pct
## 1 parcelid 0.000000000
## 2 airconditioningtypeid 0.728154101
## 3 architecturalstyletypeid 0.997969662
## 4 basementsqft 0.999454646
## 5 bathroomcnt 0.003839587
## 6 bedroomcnt 0.003835567
## 7 buildingclasstypeid 0.995769487
## 8 buildingqualitytypeid 0.350637491
## 9 calculatedbathnbr 0.043183460
## 10 decktypeid 0.994273113
## 11 finishedfloor1squarefeet 0.932093044
## 12 calculatedfinishedsquarefeet 0.018613387
## 13 finishedsquarefeet12 0.092466645
## 14 finishedsquarefeet13 0.997430003
## 15 finishedsquarefeet15 0.936085718
## 16 finishedsquarefeet50 0.932093044
## 17 finishedsquarefeet6 0.992630017
## 18 fips 0.003831212
## 19 fireplacecnt 0.895271600
## 20 fullbathcnt 0.043183460
## 21 garagecarcnt 0.704119667
## 22 garagetotalsqft 0.704119667
## 23 hashottuborspa 0.000000000
## 24 heatingorsystemtypeid 0.394884526
## 25 latitude 0.003831212
## 26 longitude 0.003831212
## 27 lotsizesquarefeet 0.092488754
## 28 poolcnt 0.826634379
## 29 poolsizesum 0.990633847
## 30 pooltypeid10 0.987626025
## 31 pooltypeid2 0.989255387
## 32 pooltypeid7 0.837378991
## 33 propertycountylandusecode 0.000000000
## 34 propertylandusetypeid 0.003831212
## 35 propertyzoningdesc 0.000000000
## 36 rawcensustractandblock 0.003831212
## 37 regionidcity 0.021052071
## 38 regionidcounty 0.003831212
## 39 regionidneighborhood 0.612623806
## 40 regionidzip 0.004683077
## 41 roomcnt 0.003843942
## 42 storytypeid 0.999455986
## 43 threequarterbathnbr 0.895608594
## 44 typeconstructiontypeid 0.997739863
## 45 unitcnt 0.337572444
## 46 yardbuildingsqft17 0.973082359
## 47 yardbuildingsqft26 0.999113297
## 48 yearbuilt 0.020074923
## 49 numberofstories 0.771517782
## 50 fireplaceflag 0.000000000
## 51 structuretaxvaluedollarcnt 0.018418092
## 52 taxvaluedollarcnt 0.014253570
## 53 assessmentyear 0.003831882
## 54 landtaxvaluedollarcnt 0.022689473
## 55 taxamount 0.010468251
## 56 taxdelinquencyflag 0.000000000
## 57 taxdelinquencyyear 0.981085462
## 58 censustractandblock 0.025166010
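The per-feature table above can be reproduced along the following lines. This is a minimal base R sketch on a toy data frame (`properties_toy` and its values are hypothetical); the original chunk is not shown in the output and may have used a different approach. Note that `missing_pct` is a fraction between 0 and 1, not a percentage.

```r
# Toy stand-in for the properties table (hypothetical values).
properties_toy <- data.frame(
  parcelid       = c(1, 2, 3, 4),
  poolcnt        = c(NA, NA, NA, 1),
  bathroomcnt    = c(2, NA, 1, 3),
  hashottuborspa = c("", "true", "", ""),
  stringsAsFactors = FALSE
)

# Fraction of NA values in each column.
missing_frac <- colMeans(is.na(properties_toy))

# Long format: one row per feature.
missing_values <- data.frame(
  feature     = names(missing_frac),
  missing_pct = unname(missing_frac),
  stringsAsFactors = FALSE
)
missing_values
```

Blank strings are not `NA`, so a column such as `hashottuborspa` is reported as fully populated here even though most of its entries are empty. This matches the table above, where `hashottuborspa` has a missing fraction of 0 although the summary shows 2,916,203 blank values.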
## feature missing_pct
## 1 parcelid 0.000000000
## 2 airconditioningtypeid 0.728154101
## 3 bathroomcnt 0.003839587
## 4 bedroomcnt 0.003835567
## 5 buildingqualitytypeid 0.350637491
## 6 calculatedbathnbr 0.043183460
## 7 calculatedfinishedsquarefeet 0.018613387
## 8 finishedsquarefeet12 0.092466645
## 9 fips 0.003831212
## 10 fullbathcnt 0.043183460
## 11 garagecarcnt 0.704119667
## 12 garagetotalsqft 0.704119667
## 13 hashottuborspa 0.000000000
## 14 heatingorsystemtypeid 0.394884526
## 15 latitude 0.003831212
## 16 longitude 0.003831212
## 17 lotsizesquarefeet 0.092488754
## 18 propertycountylandusecode 0.000000000
## 19 propertylandusetypeid 0.003831212
## 20 propertyzoningdesc 0.000000000
## 21 rawcensustractandblock 0.003831212
## 22 regionidcity 0.021052071
## 23 regionidcounty 0.003831212
## 24 regionidneighborhood 0.612623806
## 25 regionidzip 0.004683077
## 26 roomcnt 0.003843942
## 27 unitcnt 0.337572444
## 28 yearbuilt 0.020074923
## 29 fireplaceflag 0.000000000
## 30 structuretaxvaluedollarcnt 0.018418092
## 31 taxvaluedollarcnt 0.014253570
## 32 assessmentyear 0.003831882
## 33 landtaxvaluedollarcnt 0.022689473
## 34 taxamount 0.010468251
## 35 taxdelinquencyflag 0.000000000
## 36 censustractandblock 0.025166010
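The shorter table keeps the 36 features with the lowest missing fractions: the largest retained value is 0.728 (`airconditioningtypeid`) and the smallest dropped value is 0.772 (`numberofstories`). A cutoff of 0.75 is consistent with this, but it is an assumption, since the filtering code is not shown. A sketch of the filter, using four rows copied from the table above:

```r
# Four rows copied from the missing-value table above.
missing_values <- data.frame(
  feature     = c("parcelid", "airconditioningtypeid",
                  "numberofstories", "poolcnt"),
  missing_pct = c(0, 0.728154101, 0.771517782, 0.826634379),
  stringsAsFactors = FALSE
)

# Assumed cutoff; any value between 0.729 and 0.771 gives the same result.
threshold <- 0.75

# Keep only the features below the cutoff.
good_features <- missing_values[missing_values$missing_pct < threshold, ]
good_features
```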
## [1] "parcelid" "aircon"
## [3] "architectural_style" "area_basement"
## [5] "num_bathroom" "num_bedroom"
## [7] "framing" "quality"
## [9] "num_bathroom_calc" "deck"
## [11] "area_firstfloor_finished" "area_total_calc"
## [13] "area_live_finished" "area_liveperi_finished"
## [15] "area_total_finished" "area_unknown"
## [17] "area_base" "fips"
## [19] "num_fireplace" "num_bath"
## [21] "num_garage" "area_garage"
## [23] "flag_tub" "heating"
## [25] "latitude" "longitude"
## [27] "area_lot" "num_pool"
## [29] "area_pool" "pooltypeid10"
## [31] "pooltypeid2" "pooltypeid7"
## [33] "zoning_landuse_county" "zoning_landuse"
## [35] "zoning_property" "rawcensustractandblock"
## [37] "region_city" "region_county"
## [39] "region_neighbor" "region_zip"
## [41] "num_room" "story"
## [43] "num_75_bath" "material"
## [45] "num_unit" "area_patio"
## [47] "area_shed" "build_year"
## [49] "num_story" "flag_fireplace"
## [51] "tax_building" "tax_total"
## [53] "tax_year" "tax_land"
## [55] "tax_property" "tax_delinquency"
## [57] "tax_delinquency_year" "censustractandblock"
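The shorter names listed above line up by position with the original column names (for example, `airconditioningtypeid` becomes `aircon` and `yearbuilt` becomes `build_year`). One way to apply such a renaming in base R is sketched below on a toy data frame; `properties_toy` and the `new_names` lookup are hypothetical helpers, and only three of the mappings are shown.

```r
# Toy stand-in for the properties table (hypothetical values).
properties_toy <- data.frame(
  parcelid              = c(10754147, 10759547),
  airconditioningtypeid = c(NA, 1),
  bathroomcnt           = c(0, 2),
  yearbuilt             = c(NA, 1948)
)

# Lookup: old name -> new name (a subset of the full renaming).
new_names <- c(
  airconditioningtypeid = "aircon",
  bathroomcnt           = "num_bathroom",
  yearbuilt             = "build_year"
)

# Find each old name and overwrite it; other columns keep their names.
idx <- match(names(new_names), names(properties_toy))
names(properties_toy)[idx] <- new_names
names(properties_toy)
```

Renaming by name rather than by position keeps the mapping valid if columns are later dropped or reordered.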
## parcelid logerror date year_month abs_logerror
## 1 11016594 0.0276 2016-01-01 2016-01-01 0.0276
## 2 14366692 -0.1684 2016-01-01 2016-01-01 0.1684
## 3 12098116 -0.0040 2016-01-01 2016-01-01 0.0040
## 4 12643413 0.0218 2016-01-02 2016-01-01 0.0218
## 5 14432541 -0.0050 2016-01-02 2016-01-01 0.0050
## 6 11509835 -0.2705 2016-01-02 2016-01-01 0.2705
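In the output above, `transactiondate` appears as `date`, `year_month` holds the first day of the transaction month, and `abs_logerror` is the absolute value of `logerror`. A minimal base R sketch of these derived columns follows, assuming the dates are stored as `YYYY-MM-DD` strings; `transactions_toy` is a hypothetical stand-in built from three of the rows shown.

```r
# Three rows copied from the transactions table above.
transactions_toy <- data.frame(
  parcelid        = c(11016594, 14366692, 12643413),
  logerror        = c(0.0276, -0.1684, 0.0218),
  transactiondate = c("2016-01-01", "2016-01-01", "2016-01-02"),
  stringsAsFactors = FALSE
)

# Parse the date and drop the original text column.
transactions_toy$date <- as.Date(transactions_toy$transactiondate)
transactions_toy$transactiondate <- NULL

# First day of the transaction month.
transactions_toy$year_month <-
  as.Date(format(transactions_toy$date, "%Y-%m-01"))

# Magnitude of the error, ignoring its direction.
transactions_toy$abs_logerror <- abs(transactions_toy$logerror)
transactions_toy
```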
## logerror abs_logerror
## 1 0.0276 0.0276
## 2 -0.1684 0.1684
## 3 -0.0040 0.0040
## 4 0.0218 0.0218
## 5 -0.0050 0.0050
## 6 -0.2705 0.2705
## parcelid logerror date year_month abs_logerror percentile
## 1 11016594 0.0276 2016-01-01 2016-01-01 0.0276 3
## 2 14366692 -0.1684 2016-01-01 2016-01-01 0.1684 5
## 3 12098116 -0.0040 2016-01-01 2016-01-01 0.0040 1
## 4 12643413 0.0218 2016-01-02 2016-01-01 0.0218 3
## 5 14432541 -0.0050 2016-01-02 2016-01-01 0.0050 1
## 6 11509835 -0.2705 2016-01-02 2016-01-01 0.2705 5
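The `percentile` column takes small integer values that increase with `abs_logerror` (1, 3, and 5 in the rows shown), which is consistent with five quantile-based bins. The exact binning used is not shown, so the sketch below is an assumption: it assigns quintile bins with `cut()` and `quantile()` on hypothetical values.

```r
# Hypothetical absolute log errors, for illustration only.
abs_logerror <- (1:10) / 100

# Bin edges at the 0%, 20%, ..., 100% quantiles.
breaks <- quantile(abs_logerror, probs = seq(0, 1, by = 0.2))

# Integer bin codes 1 (smallest errors) to 5 (largest errors).
percentile <- cut(abs_logerror, breaks = breaks,
                  include.lowest = TRUE, labels = FALSE)

data.frame(abs_logerror, percentile)
```

With real data, `cut()` stops with an error if two quantiles coincide, so duplicated break points would need handling first.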